On Stochastic Tree Distances and Their Training via Expectation-Maximisation
Abstract
Continuing a line of work initiated by Boyer et al. (2007), we consider the generalisation of stochastic string distance to a stochastic tree distance. We point out some hitherto overlooked modifications to the Zhang/Shasha tree-distance algorithm that are necessary for the all-paths and Viterbi variants of this stochastic tree distance. A strategy towards an EM cost-adaptation algorithm for the all-paths distance, suggested by Boyer et al. (2007), is shown to overlook necessary ancestry-preservation constraints, and an alternative EM cost-adaptation algorithm for the Viterbi variant is proposed. We report experiments in which a distance-weighted kNN categorisation algorithm is applied to a corpus of categorised tree structures. We show that a 67.7% baseline using standard unit costs can be improved to 72.5% by the EM cost-adaptation algorithm.
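As a rough illustration of the categorisation step mentioned in the abstract, the following minimal sketch shows distance-weighted kNN voting over an arbitrary distance function. The function name `weighted_knn`, the inverse-distance weighting, and the `eps` smoothing term are assumptions made for this example; the abstract does not state which weighting scheme the authors used, and in the paper's setting `distance` would be the (cost-adapted) tree distance rather than the toy numeric distance used here.

```python
from collections import defaultdict


def weighted_knn(query, labelled, distance, k=3, eps=1e-9):
    """Classify `query` by distance-weighted voting among its k nearest items.

    `labelled` is a list of (item, label) pairs and `distance` is any
    function returning a non-negative number for two items.
    Weighting by 1 / (d + eps) is an assumption of this sketch.
    """
    # Score every labelled item, then keep the k closest ones.
    scored = [(distance(query, item), label) for item, label in labelled]
    neighbours = sorted(scored, key=lambda pair: pair[0])[:k]

    # Each neighbour votes for its label with weight inversely
    # proportional to its distance from the query.
    votes = defaultdict(float)
    for d, label in neighbours:
        votes[label] += 1.0 / (d + eps)
    return max(votes, key=votes.get)


# Toy usage: numbers stand in for trees, absolute difference for tree distance.
toy_data = [(0, "a"), (10, "b"), (11, "b")]
toy_distance = lambda x, y: abs(x - y)
print(weighted_knn(1, toy_data, toy_distance, k=3))
```

In the toy call, the single close neighbour labelled "a" outweighs the two distant neighbours labelled "b", which is the behaviour that distinguishes distance-weighted voting from a plain majority vote.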
Similar resources
Using evolutionary Expectation Maximisation to estimate indel rates
Motivation: The Expectation Maximisation algorithm, in the form of the Baum-Welch algorithm (for HMMs) or the Inside-Outside algorithm (for SCFGs), is a powerful way to estimate the parameters of stochastic grammars for biological sequence analysis. To use this algorithm for multiple-sequence evolutionary modeling, it would be useful to apply the EM algorithm to estimate not just the probability...
Parameter Estimation for Discrete-Time Nonlinear Systems Using EM
In this paper we consider parameter estimation of general stochastic nonlinear state-space models using the Maximum Likelihood method. This is accomplished by employing an Expectation Maximisation algorithm, whose essential components are a particle smoother for the expectation step and a gradient-based search for the maximisation step. The utility of this method is illustrated...
Inducing Sound Segment Differences Using Pair Hidden Markov Models
Pair Hidden Markov Models (PairHMMs) are trained to align the pronunciation transcriptions of a large contemporary collection of Dutch dialect material, the Goeman-Taeldeman-Van Reenen-Project (GTRP, collected 1980–1995). We focus on the question of how to incorporate information about sound segment distances to improve sequence distance measures for use in dialect comparison. PairHMMs induce se...
Diffused expectation maximisation for image segmentation - Electronics Letters
Diffused expectation maximisation is a novel algorithm for image segmentation. The method models an image as a finite mixture, where each mixture component corresponds to a region class and uses a maximum likelihood approach to estimate the parameters of each class, via the expectation maximisation algorithm, coupled with anisotropic diffusion on classes, in order to account for the spatial dep...
A Metropolis version of the EM algorithm
The Expectation Maximisation (EM) algorithm is a popular technique for maximum likelihood in incomplete data models. In order to overcome its documented limitations, several stochastic variants are proposed in the literature. However, none of these algorithms is guaranteed to provide a global maximizer of the likelihood function. In this paper we introduce the MEM algorithm — a Metropolis versi...